Log-linear Models for Uyghur Segmentation in Spoken Language Translation
Abstract
To alleviate data sparsity in spoken Uyghur machine translation, we propose a log-linear morphological segmentation approach. Instead of learning the model only from a monolingual annotated corpus, this approach optimizes Uyghur segmentation for spoken language translation using both bilingual and monolingual corpora. It combines several features: a traditional conditional random field (CRF) feature, a bilingual word alignment feature, and a monolingual suffix-word co-occurrence feature. Experimental results show that the proposed segmentation model improves Uyghur spoken language translation by 1.6 BLEU points over the state-of-the-art baseline.
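As a rough illustration of how a log-linear model can combine such features, the sketch below scores candidate segmentations of one word by a weighted sum of feature values and normalizes the result into a probability distribution. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `log_linear_distribution`, the feature names `crf`, `align` and `cooc`, the candidate segmentations, and all feature values and weights are hypothetical placeholders, since the abstract does not give the actual feature definitions or the weight-tuning procedure.

```python
import math


def log_linear_distribution(candidates, feature_fns, weights):
    """Return P(candidate) under a generic log-linear model.

    candidates  : list of candidate segmentations (hashable objects)
    feature_fns : list of functions mapping a candidate to a real value
    weights     : one weight per feature function
    """
    # Weighted sum of feature values for each candidate.
    scores = [
        sum(w * f(c) for w, f in zip(weights, feature_fns))
        for c in candidates
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {c: e / total for c, e in zip(candidates, exps)}


# Made-up feature values for two hypothetical segmentations of one word.
# "crf", "align" and "cooc" stand in for the three feature types named in
# the abstract; the numbers are illustrative only.
toy_features = {
    ("stem", "+suffix"): {"crf": -0.4, "align": -0.7, "cooc": -0.2},
    ("stemsuffix",): {"crf": -1.1, "align": -1.5, "cooc": -0.9},
}
feature_fns = [
    lambda c: toy_features[c]["crf"],
    lambda c: toy_features[c]["align"],
    lambda c: toy_features[c]["cooc"],
]
weights = [1.0, 0.5, 0.5]  # assumed weights; in practice these are tuned

probs = log_linear_distribution(list(toy_features), feature_fns, weights)
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

In a translation setting the weights would normally be tuned on held-out data rather than set by hand; the fixed values here only keep the example self-contained.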
Similar resources
Rule Based Analysis of the Uyghur Nouns
This paper describes the implementation of a rule-based analyzer for nouns in Uyghur (spoken in Xinjiang, China). We hope it will contribute to advanced studies of the Uyghur language in machine translation and natural language processing. Like all Turkic languages, Uyghur is an agglutinative language with productive inflectional and derivational suffixes. In...
Uyghur Language Model with Graphic Structure
This paper describes a novel agglutinative language modeling strategy for Uyghur with graphic language model as structure. In graphic modeling language model, sentences are organized by morphemes as a directed graph, which is different from the linear structure in n-gram language models. The graphic language model is verified in two typical natural language processing application scenarios, mor...
Dialect Translation: Integrating Bayesian Co-segmentation Models with Pivot-based SMT
Recent research on multilingual statistical machine translation (SMT) focuses on the usage of pivot languages in order to overcome resource limitations for certain language pairs. This paper proposes a new method to translate a dialect language into a foreign language by integrating transliteration approaches based on Bayesian co-segmentation (BCS) models with pivot-based SMT approaches. The ad...
Discriminative Learning of Feature Functions of Generative Type in Speech Translation
The speech translation (ST) problem can be formulated as a log-linear model with multiple features that capture different levels of dependency between the input voice observation and the output translations. However, while the log-linear model itself is of discriminative nature, many of the feature functions are derived from generative models, which are usually estimated by conventional maxim...
A WFST-based log-linear framework for speaking-style transformation
● Objective: Transform spoken-style language (V) into written-style language (W) for the creation of transcripts
● Approach: Statistical machine translation to “translate” from verbatim text to written text
● Innovations:
  ● Log-linear modeling for improved accuracy
  ● Introduction of features to handle common phenomena in speaking-style transformation
  ● WFST-based implementation for integration with W...
Publication date: 2017